Abstract

We give a general proof of convergence for the Alternating Direction Method of Multipliers
(ADMM). ADMM is an optimization algorithm that has recently become very popular due to
its ability to solve large-scale and/or distributed problems. We prove that the sequence
generated by ADMM converges to a primal-dual optimal solution. We assume that the
functions $f$ and $g$, defining the cost $f(x) + g(y)$, are real-valued, and that the variables are
constrained to lie in polyhedral sets $X$ and $Y$. Our proof is an extension of the proofs from [3], [4].

1 Introduction

The Alternating Direction Method of Multipliers (ADMM), first proposed in the seventies by [5], [7],
is a versatile algorithm that is well suited to solving large-scale or distributed problems. Lately,
ADMM has become very popular because it efficiently handles problems that cannot be solved by
conventional interior-point methods. Although ADMM and some of its variations have been
extensively studied (e.g., [5], [6], [3]), there are still some theoretical aspects left to explore. As an
example, only very recently has it been proved that ADMM has an $O(1/k)$ convergence rate [5].
We refer to the survey [1] for recent applications, history, and extensions of ADMM.
ADMM solves the following optimization problem:

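\[
\begin{array}{ll}
\text{minimize} & f(x) + g(y) \\
\text{subject to} & x \in X, \; y \in Y \\
& Ax + By = c,
\end{array}
\tag{1}
\]

where the variable is $(x, y) \in \mathbb{R}^{n_1} \times \mathbb{R}^{n_2}$, and $f : \mathbb{R}^{n_1} \to \mathbb{R}$, $g : \mathbb{R}^{n_2} \to \mathbb{R}$ are given convex functions,
$A \in \mathbb{R}^{m \times n_1}$, $B \in \mathbb{R}^{m \times n_2}$ are two given matrices, $X$ and $Y$ are convex sets, and $c \in \mathbb{R}^m$ is a constant vector.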
The augmented Lagrangian of (1) is

\[
L_\rho(x, y; \lambda) = f(x) + g(y) + \lambda^T (Ax + By - c) + \frac{\rho}{2}\|Ax + By - c\|^2,
\]

where $\rho > 0$ is a predefined positive parameter, and $\lambda \in \mathbb{R}^m$ is the dual variable associated with the
constraint $Ax + By = c$. ADMM solves (1) by concatenating the method of multipliers with one
iteration of the nonlinear Gauss-Seidel algorithm [3], i.e., it iterates the following equations on $k$:

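\[
\begin{aligned}
x^{k+1} &\in \arg\min_{x \in X} L_\rho(x, y^k; \lambda^k) \\
y^{k+1} &\in \arg\min_{y \in Y} L_\rho(x^{k+1}, y; \lambda^k) \\
\lambda^{k+1} &= \lambda^k + \rho\,(Ax^{k+1} + By^{k+1} - c).
\end{aligned}
\tag{2}
\]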

In words, the augmented Lagrangian $L_\rho$ is first minimized with respect to (w.r.t.) $x$, keeping $y$
and $\lambda$ fixed at $y^k$ and $\lambda^k$, respectively. Then, $L_\rho$ is minimized w.r.t. $y$, with $x$ fixed at the new
value $x^{k+1}$ (and $\lambda$ fixed at $\lambda^k$). Finally, the dual variable $\lambda$ is updated by a gradient-ascent step.
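
The iteration (2) can be illustrated on a toy problem. The following is a minimal sketch, not part of the original proof, under assumptions chosen so that every update has a closed form: scalar variables, $f(x) = \tfrac{1}{2}(x-a)^2$, $g(y) = \tfrac{1}{2}(y-b)^2$, intervals for $X$ and $Y$ (which are polyhedral sets), and $A = 1$, $B = -1$, $c = 0$. The function name `admm_scalar` and all numerical values are hypothetical.

```python
def clip(v, lo, hi):
    """Project the scalar v onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


def admm_scalar(a, b, x_box, y_box, rho=1.0, iters=100):
    """Run iteration (2) on the assumed toy problem

        minimize 0.5*(x - a)^2 + 0.5*(y - b)^2
        subject to x in x_box, y in y_box, x - y = 0,

    i.e. A = 1, B = -1, c = 0 in the notation of (1).
    """
    x, y, lam = 0.0, 0.0, 0.0
    for _ in range(iters):
        # x-update: the augmented Lagrangian is a strictly convex 1-D
        # quadratic in x, so its minimizer over an interval is the
        # projection of the unconstrained minimizer.
        x = clip((a - lam + rho * y) / (1.0 + rho), x_box[0], x_box[1])
        # y-update: same argument, using the new x.
        y = clip((b + lam + rho * x) / (1.0 + rho), y_box[0], y_box[1])
        # Dual update: gradient-ascent step with step size rho.
        lam = lam + rho * (x - y)
    return x, y, lam


x, y, lam = admm_scalar(a=3.0, b=1.0, x_box=(0.0, 1.5), y_box=(-1.0, 4.0))
print(x, y, lam)
```

For these values the iterates approach $x = y = 1.5$ and $\lambda = 0.5$, which can be checked by hand against the optimality conditions of the toy problem.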
There are many variations of ADMM and also many variations of the proof of its convergence.
Here, we use the techniques of [3], [4] to prove convergence of a version of ADMM that applies to the case where $X$
and $Y$ are polyhedral, the functions $f$ and $g$ are real-valued and convex, and the matrices $A$ and $B$
have full column-rank. We prove that the sequence $\{(x^k, y^k, \lambda^k)\}$ generated by (2) has a single
limit point and that this limit point is a primal-dual solution of (1) or, in other words, a saddle
point of the augmented Lagrangian $L_\rho$.

2 Proof of Convergence 

We aim to prove the following theorem. 

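Theorem 1 (Convergence of ADMM) Assume:

1. $f : \mathbb{R}^{n_1} \to \mathbb{R}$ and $g : \mathbb{R}^{n_2} \to \mathbb{R}$ are convex functions over $\mathbb{R}^{n_1}$ and $\mathbb{R}^{n_2}$, respectively

2. $X \subset \mathbb{R}^{n_1}$ and $Y \subset \mathbb{R}^{n_2}$ are polyhedral sets

3. Problem (1) is solvable (and denote its optimal objective by $p^\star$)

4. Matrices $A$ and $B$ have full column-rank

Then,

1. $f(x^k) + g(y^k) \to p^\star$

2. $\{(x^k, y^k)\}$ has a single limit point $(x^\star, y^\star)$; furthermore, $(x^\star, y^\star)$ solves (1)

3. $\{\lambda^k\}$ has a unique limit point $\lambda^\star$; furthermore, $\lambda^\star$ solves the dual problem of (1):

\[
\underset{\lambda}{\text{maximize}} \;\; F(\lambda) + G(\lambda) - \lambda^T c,
\tag{3}
\]

where $F(\lambda) = \inf_{x \in X} \big(f(x) + \lambda^T A x\big)$ and $G(\lambda) = \inf_{y \in Y} \big(g(y) + \lambda^T B y\big)$.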

We present a proof for this theorem that is based on the proofs from [4] and [3, Prop. 4.2].
While [4] does not make assumption 4 and consequently does not prove claim 2, [3] proves all our
claims by assuming that $B$ is the identity matrix. So, Theorem 1 generalizes the proof from [3]
by considering more general matrices $B$, and it also generalizes the proof of [4] by introducing
assumption 4 and proving claim 2. Note that, although our assumptions on $f$ and $g$ are more
restrictive than the ones in [5], Theorem 1 can be straightforwardly adapted to the assumptions on
$f$ and $g$ made in [1].

Before proving Theorem 1, we need the following lemma.



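Lemma 1 Let $\phi$ and $\psi$ be two convex functions from $\mathbb{R}^n$ to $\mathbb{R}$, and let $\psi$ be differentiable in $\mathbb{R}^n$.
Also, let $X \subset \mathbb{R}^n$ be a closed convex set. Then,

\[
x^\star \in \arg\min_{x \in X} \; \phi(x) + \psi(x)
\quad \Longleftrightarrow \quad
x^\star \in \arg\min_{x \in X} \; \phi(x) + \nabla\psi(x^\star)^T (x - x^\star).
\]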









Proof The optimality conditions [2, Prop. 4.7.2] for the left-hand side are: there exists $d \in \partial(\phi + \psi)(x^\star)$
such that $d^T (x - x^\star) \geq 0$ for all $x \in X$. Since $\partial(\phi + \psi)(x^\star) = \partial\phi(x^\star) + \nabla\psi(x^\star)$,
this coincides with the optimality conditions for the right-hand side. □
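
Lemma 1 can be checked numerically on a small instance. The sketch below is an illustration under assumed choices that do not appear in the original text: $\phi(x) = |x|$, $\psi(x) = \tfrac{1}{2}(x-2)^2$, $X = [-3, 3]$, and a grid search in place of an exact minimization.

```python
def phi(x):
    return abs(x)                 # convex, not differentiable at 0


def psi(x):
    return 0.5 * (x - 2.0) ** 2   # convex and differentiable


def dpsi(x):
    return x - 2.0                # gradient of psi


# Grid over the assumed set X = [-3, 3] with step 0.001.
grid = [i / 1000.0 for i in range(-3000, 3001)]

# Left-hand side of Lemma 1: minimize phi + psi over X.
x_star = min(grid, key=lambda x: phi(x) + psi(x))


def linearized(x):
    """Right-hand side objective: psi replaced by its linearization at x_star."""
    return phi(x) + dpsi(x_star) * (x - x_star)


# x_star should attain the minimum of the linearized objective as well.
lin_min = min(linearized(x) for x in grid)
is_minimizer = abs(linearized(x_star) - lin_min) < 1e-9
print(x_star, is_minimizer)
```

In this instance the linearized problem has many minimizers (every $x \in [0, 3]$), while the original problem has only $x^\star = 1$. This is why the lemma is stated with set membership rather than equality of minimizers.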

We are now in a position to prove Theorem 1.

Proof First, note that assumptions 1-3 make Proposition 5.2.1 of [1] applicable, which says that
strong duality holds: there is no duality gap for (1), and the dual problem (3) is solvable.
The pair $(x^\star, y^\star)$ will denote any primal solution (there exists at least one by assumption 3) and $\lambda^\star$
will denote any dual solution (there exists at least one by strong duality). We will also use the
following notation:

\[
p^k = f(x^k) + g(y^k), \qquad r^k = Ax^k + By^k - c.
\]

The proof consists of showing that the following inequalities hold (see the proofs below):

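\[
p^\star - p^{k+1} \leq (\lambda^\star)^T r^{k+1}
\tag{4}
\]
\[
p^{k+1} - p^\star \leq -(\lambda^{k+1})^T r^{k+1} - \rho\big(B(y^{k+1} - y^k)\big)^T \big(B(y^{k+1} - y^\star) - r^{k+1}\big)
\tag{5}
\]
\[
V^{k+1} \leq V^k - \rho\|r^{k+1}\|^2 - \rho\|B(y^{k+1} - y^k)\|^2,
\tag{6}
\]

where $V^k$ is the Lyapunov function

\[
V^k := \frac{1}{\rho}\|\lambda^k - \lambda^\star\|^2 + \rho\|B(y^k - y^\star)\|^2.
\]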

Once these inequalities are proven, claim 1 of the theorem follows by the following argument. From (6),
$V^k \leq V^0$, meaning that $\lambda^k$ and $By^k$ are bounded. Furthermore,

\[
\rho \sum_{k=0}^{\infty} \Big( \|r^{k+1}\|^2 + \|B(y^{k+1} - y^k)\|^2 \Big) \leq V^0.
\]

Since $V^0$ is finite, $r^k \to 0$ and $B(y^{k+1} - y^k) \to 0$ as $k \to \infty$. These facts, together with the fact
that $\lambda^k$ and $By^k$ are bounded, imply that $p^k \to p^\star$ (since the right-hand sides of both (4) and (5) converge
to zero). This proves claim 1.
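
The summability bound used in this argument follows by telescoping (6). As a short worked step, summing (6) over $k = 0, \ldots, K$ gives

```latex
\[
\rho \sum_{k=0}^{K} \Big( \|r^{k+1}\|^2 + \|B(y^{k+1} - y^k)\|^2 \Big)
\;\leq\; \sum_{k=0}^{K} \big( V^k - V^{k+1} \big)
\;=\; V^0 - V^{K+1}
\;\leq\; V^0 ,
\]
```

where the last inequality uses $V^{K+1} \geq 0$. Letting $K \to \infty$ yields the bound on the infinite sum, and a convergent series of nonnegative terms has terms that tend to zero.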

We now prove that the sequence $\{(x^k, y^k)\}$ is bounded, which implies that it has limit points.
To see that, note that $\{By^k\}$ is bounded and $B$ has full column-rank (assumption 4), thus $\{y^k\}$ is
also bounded. Also, $r^k = Ax^k + By^k - c \to 0$, which implies that $\{Ax^k\}$ is bounded; again, $A$ has
full column-rank, meaning that $\{x^k\}$ is bounded.
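
The step from a bounded $\{By^k\}$ to a bounded $\{y^k\}$ can be made quantitative. Writing $\sigma_{\min}(B)$ for the smallest singular value of $B$ (a symbol introduced here for illustration, not used in the original text), full column-rank means $\sigma_{\min}(B) > 0$, and therefore

```latex
\[
\|By^k\| \;\geq\; \sigma_{\min}(B)\,\|y^k\|
\quad \Longrightarrow \quad
\|y^k\| \;\leq\; \frac{1}{\sigma_{\min}(B)}\,\|By^k\| .
\]
```

The same bound with $A$ in place of $B$ gives boundedness of $\{x^k\}$ from boundedness of $\{Ax^k\}$.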

Given that the sequence $\{(x^k, y^k)\}$ has limit points, we now observe that any
of its limit points, say $(\bar{x}, \bar{y})$, is primal optimal. In fact, this limit point attains the optimal value because $p^k =
f(x^k) + g(y^k) \to p^\star$, so any subsequence of $p^k$ also converges to $p^\star$, and $f$ and $g$ are continuous (being real-valued convex functions). Also, any subsequence
of $r^k = Ax^k + By^k - c$ converges to $0$ and, together with the fact that the sets $X$ and $Y$ are closed,
this implies that $(\bar{x}, \bar{y})$ is feasible in (1).

Note that, although any limit point of $\{(x^k, y^k)\}$ is primal optimal, this alone does not
guarantee that the sequence converges. Assumption 4 rules out non-convergence. We will prove this after proving
inequalities (4)-(6) and that any limit point of $\{\lambda^k\}$ is dual optimal (note that $\{\lambda^k\}$ is bounded
and hence it has limit points).



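Proof of (4). We have seen that strong duality holds for the pair (1), (3). Thus, $(x^\star, y^\star, \lambda^\star)$
satisfies the KKT conditions. In particular,

\[
(x^\star, y^\star) \in \arg\min_{x \in X,\, y \in Y} \; f(x) + g(y) + (\lambda^\star)^T (Ax + By - c),
\]

which implies

\[
\underbrace{f(x^\star) + g(y^\star)}_{=\,p^\star} + (\lambda^\star)^T \underbrace{(Ax^\star + By^\star - c)}_{=\,0}
\;\leq\;
\underbrace{f(x^{k+1}) + g(y^{k+1})}_{=\,p^{k+1}} + (\lambda^\star)^T \underbrace{(Ax^{k+1} + By^{k+1} - c)}_{=\,r^{k+1}},
\]

or

\[
p^\star - p^{k+1} \leq (\lambda^\star)^T r^{k+1}.
\]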
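Proof of (5). To prove (5), we start by working out the optimization problems defining $x^{k+1}$ and $y^{k+1}$
in (2):

\[
\begin{aligned}
x^{k+1} &\in \arg\min_{x \in X} \; L_\rho(x, y^k; \lambda^k) \\
&= \arg\min_{x \in X} \; f(x) + g(y^k) + (\lambda^k)^T (Ax + By^k - c) + \frac{\rho}{2}\|Ax + By^k - c\|^2,
\end{aligned}
\]

and using Lemma 1 (with $\phi = f$ and $\psi$ given by the remaining, differentiable terms),

\[
\begin{aligned}
x^{k+1} &\in \arg\min_{x \in X} \; f(x) + \big(A^T \lambda^k + \rho A^T (Ax^{k+1} + By^k - c)\big)^T (x - x^{k+1}) \\
&= \arg\min_{x \in X} \; f(x) + \big(A^T \lambda^k + \rho A^T (Ax^{k+1} + By^k - c)\big)^T x \\
&= \arg\min_{x \in X} \; f(x) + \big(\lambda^k + \rho (Ax^{k+1} + By^k - c)\big)^T Ax,
\end{aligned}
\]

and since $\lambda^{k+1} = \lambda^k + \rho r^{k+1}$,

\[
x^{k+1} \in \arg\min_{x \in X} \; f(x) + \big(\lambda^{k+1} - \rho B(y^{k+1} - y^k)\big)^T Ax.
\tag{7}
\]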



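Using the same reasoning,

\[
\begin{aligned}
y^{k+1} &\in \arg\min_{y \in Y} \; L_\rho(x^{k+1}, y; \lambda^k) \\
&= \arg\min_{y \in Y} \; f(x^{k+1}) + g(y) + (\lambda^k)^T (Ax^{k+1} + By - c) + \frac{\rho}{2}\|Ax^{k+1} + By - c\|^2,
\end{aligned}
\]

so that, by Lemma 1,

\[
\begin{aligned}
y^{k+1} &\in \arg\min_{y \in Y} \; g(y) + \big(B^T \lambda^k + \rho B^T (Ax^{k+1} + By^{k+1} - c)\big)^T (y - y^{k+1}) \\
&= \arg\min_{y \in Y} \; g(y) + \big(\lambda^k + \rho (Ax^{k+1} + By^{k+1} - c)\big)^T By,
\end{aligned}
\]

and since $\lambda^{k+1} = \lambda^k + \rho r^{k+1}$,

\[
y^{k+1} \in \arg\min_{y \in Y} \; g(y) + (\lambda^{k+1})^T By.
\tag{8}
\]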

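Now we apply (7) and (8) to $x^{k+1}$ and $x^\star$, and to $y^{k+1}$ and $y^\star$, respectively. We have

\[
f(x^{k+1}) + \big(\lambda^{k+1} - \rho B(y^{k+1} - y^k)\big)^T Ax^{k+1} \leq f(x^\star) + \big(\lambda^{k+1} - \rho B(y^{k+1} - y^k)\big)^T Ax^\star
\tag{9}
\]

and

\[
g(y^{k+1}) + (\lambda^{k+1})^T By^{k+1} \leq g(y^\star) + (\lambda^{k+1})^T By^\star.
\tag{10}
\]

Summing up (9) and (10) we get

\[
\begin{aligned}
& p^{k+1} + (\lambda^{k+1})^T (Ax^{k+1} + By^{k+1}) - \rho\big(B(y^{k+1} - y^k)\big)^T Ax^{k+1}
\leq p^\star + (\lambda^{k+1})^T \underbrace{(Ax^\star + By^\star)}_{=\,c} - \rho\big(B(y^{k+1} - y^k)\big)^T Ax^\star \\
\Longleftrightarrow \quad & p^{k+1} - p^\star \leq -(\lambda^{k+1})^T r^{k+1} - \rho\big(B(y^{k+1} - y^k)\big)^T A(x^\star - x^{k+1}),
\end{aligned}
\]

and since $r^{k+1} = Ax^{k+1} + By^{k+1} - c$ and $c = Ax^\star + By^\star$, we have $r^{k+1} = A(x^{k+1} - x^\star) + B(y^{k+1} - y^\star)$,
thus

\[
p^{k+1} - p^\star \leq -(\lambda^{k+1})^T r^{k+1} - \rho\big(B(y^{k+1} - y^k)\big)^T \big(B(y^{k+1} - y^\star) - r^{k+1}\big),
\]

which is inequality (5).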
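Proof of (6). We concatenate (4) and (5):

\[
\begin{aligned}
& -(\lambda^\star)^T r^{k+1} \leq -(\lambda^{k+1})^T r^{k+1} - \rho\big(B(y^{k+1} - y^k)\big)^T \big(B(y^{k+1} - y^\star) - r^{k+1}\big) \\
\Longleftrightarrow \quad & (\lambda^{k+1} - \lambda^\star)^T r^{k+1} - \rho\big(B(y^{k+1} - y^k)\big)^T r^{k+1} + \rho\big(B(y^{k+1} - y^k)\big)^T B(y^{k+1} - y^\star) \leq 0 \\
\Longleftrightarrow \quad & 2(\lambda^{k+1} - \lambda^\star)^T r^{k+1} - 2\rho\big(B(y^{k+1} - y^k)\big)^T r^{k+1} + 2\rho\big(B(y^{k+1} - y^k)\big)^T B(y^{k+1} - y^\star) \leq 0.
\end{aligned}
\tag{11}
\]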



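All we have to do now is to manipulate (11) in order to get (6). Taking into account that $\lambda^{k+1} =
\lambda^k + \rho r^{k+1}$, the first term of (11) becomes:

\[
\begin{aligned}
2(\lambda^{k+1} - \lambda^\star)^T r^{k+1} &= 2(\lambda^k - \lambda^\star)^T r^{k+1} + 2\rho\|r^{k+1}\|^2 \\
&= 2(\lambda^k - \lambda^\star)^T r^{k+1} + \rho\|r^{k+1}\|^2 + \rho\|r^{k+1}\|^2
\end{aligned}
\]

and replacing $r^{k+1} = \frac{1}{\rho}(\lambda^{k+1} - \lambda^k)$,

\[
= \frac{2}{\rho}(\lambda^k - \lambda^\star)^T (\lambda^{k+1} - \lambda^k) + \frac{1}{\rho}\|\lambda^{k+1} - \lambda^k\|^2 + \rho\|r^{k+1}\|^2
\]

and completing the square of the first two terms,

\[
\begin{aligned}
&= \frac{1}{\rho}\|\lambda^k - \lambda^\star + \lambda^{k+1} - \lambda^k\|^2 - \frac{1}{\rho}\|\lambda^k - \lambda^\star\|^2 + \rho\|r^{k+1}\|^2 \\
&= \frac{1}{\rho}\big(\|\lambda^{k+1} - \lambda^\star\|^2 - \|\lambda^k - \lambda^\star\|^2\big) + \rho\|r^{k+1}\|^2.
\end{aligned}
\tag{12}
\]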
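
The square-completion step leading to (12) relies on an elementary identity, written out here for convenience with generic vectors $u$ and $v$ (symbols introduced only for this remark):

```latex
\[
2u^T v + \|v\|^2 \;=\; \|u + v\|^2 - \|u\|^2 .
\]
```

Taking $u = \lambda^k - \lambda^\star$ and $v = \lambda^{k+1} - \lambda^k$, so that $u + v = \lambda^{k+1} - \lambda^\star$, and multiplying by $1/\rho$ gives the first two terms of (12).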



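Insert (12) in (11):

\[
\frac{1}{\rho}\big(\|\lambda^{k+1} - \lambda^\star\|^2 - \|\lambda^k - \lambda^\star\|^2\big)
+ \underbrace{\rho\|r^{k+1}\|^2 - 2\rho\big(B(y^{k+1} - y^k)\big)^T r^{k+1}}_{=\,E_1}
+ 2\rho\big(B(y^{k+1} - y^k)\big)^T B(y^{k+1} - y^\star) \leq 0.
\tag{13}
\]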
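Completing the square of $E_1$, the sum of $E_1$ and the last term of (13) becomes

\[
\rho\|r^{k+1} - B(y^{k+1} - y^k)\|^2 \underbrace{{} - \rho\|B(y^{k+1} - y^k)\|^2 + 2\rho\big(B(y^{k+1} - y^k)\big)^T B(y^{k+1} - y^\star)}_{=\,E_2}
\]

and completing the square of $E_2$,

\[
\begin{aligned}
&= \rho\|r^{k+1} - B(y^{k+1} - y^k)\|^2 - \rho\big(\|B(y^{k+1} - y^k) - B(y^{k+1} - y^\star)\|^2 - \|B(y^{k+1} - y^\star)\|^2\big) \\
&= \rho\|r^{k+1} - B(y^{k+1} - y^k)\|^2 + \rho\big(\|B(y^{k+1} - y^\star)\|^2 - \|B(y^k - y^\star)\|^2\big).
\end{aligned}
\]

Replacing this in (13),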

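\[
\begin{aligned}
& \frac{1}{\rho}\big(\|\lambda^{k+1} - \lambda^\star\|^2 - \|\lambda^k - \lambda^\star\|^2\big) + \rho\|r^{k+1} - B(y^{k+1} - y^k)\|^2 + \rho\big(\|B(y^{k+1} - y^\star)\|^2 - \|B(y^k - y^\star)\|^2\big) \leq 0 \\
\Longleftrightarrow \quad & \underbrace{\Big(\frac{1}{\rho}\|\lambda^{k+1} - \lambda^\star\|^2 + \rho\|B(y^{k+1} - y^\star)\|^2\Big)}_{=\,V^{k+1}} - \underbrace{\Big(\frac{1}{\rho}\|\lambda^k - \lambda^\star\|^2 + \rho\|B(y^k - y^\star)\|^2\Big)}_{=\,V^k} \leq -\rho\|r^{k+1} - B(y^{k+1} - y^k)\|^2 \\
\Longleftrightarrow \quad & V^k - V^{k+1} \geq \rho\|r^{k+1} - B(y^{k+1} - y^k)\|^2 \\
\Longleftrightarrow \quad & V^k - V^{k+1} \geq \rho\|r^{k+1}\|^2 + \rho\|B(y^{k+1} - y^k)\|^2 - 2\rho\big(B(y^{k+1} - y^k)\big)^T r^{k+1}.
\end{aligned}
\tag{14}
\]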
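What is left to prove is that $-2\rho\big(B(y^{k+1} - y^k)\big)^T r^{k+1} \geq 0$. This is derived from (8), in the same way as (10): since $y^{k+1}$ minimizes $g(y) + (\lambda^{k+1})^T By$ over $Y$ and $y^k$ minimizes $g(y) + (\lambda^k)^T By$ over $Y$,

\[
g(y^{k+1}) + (\lambda^{k+1})^T By^{k+1} \leq g(y^k) + (\lambda^{k+1})^T By^k
\tag{15}
\]
\[
g(y^k) + (\lambda^k)^T By^k \leq g(y^{k+1}) + (\lambda^k)^T By^{k+1}.
\tag{16}
\]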
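Adding (15) and (16),

\[
\begin{aligned}
& (\lambda^{k+1} - \lambda^k)^T By^{k+1} \leq (\lambda^{k+1} - \lambda^k)^T By^k \\
\Longleftrightarrow \quad & (\lambda^{k+1} - \lambda^k)^T B(y^{k+1} - y^k) \leq 0 \\
\Longleftrightarrow \quad & \rho\,(r^{k+1})^T B(y^{k+1} - y^k) \leq 0 \\
\Longleftrightarrow \quad & -2\rho\big(B(y^{k+1} - y^k)\big)^T r^{k+1} \geq 0.
\end{aligned}
\]

Therefore, from (14) we get (6):

\[
V^k - V^{k+1} \geq \rho\|r^{k+1}\|^2 + \rho\|B(y^{k+1} - y^k)\|^2.
\]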
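
Inequality (6) can be checked numerically. The sketch below is an illustration only, under assumptions that are not part of the original text: a scalar toy problem with $f(x) = \tfrac{1}{2}(x-3)^2$, $g(y) = \tfrac{1}{2}(y-1)^2$, $X = [0, 1.5]$, $Y = [-1, 4]$, $A = 1$, $B = -1$, $c = 0$, whose solution $y^\star = 1.5$, $\lambda^\star = 0.5$ was derived by hand from the optimality conditions.

```python
def clip(v, lo, hi):
    """Project the scalar v onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


# Assumed toy data (hypothetical values, see the text above).
a, b = 3.0, 1.0
x_box, y_box = (0.0, 1.5), (-1.0, 4.0)
rho = 1.0
y_opt, lam_opt = 1.5, 0.5  # hand-derived primal/dual solution of the toy problem


def lyapunov(y, lam):
    """V = (1/rho)*|lam - lam*|^2 + rho*|B*(y - y*)|^2 with B = -1."""
    return (lam - lam_opt) ** 2 / rho + rho * (y - y_opt) ** 2


x, y, lam = 0.0, 0.0, 0.0
gaps = []  # V^k - rho*|r^{k+1}|^2 - rho*|B(y^{k+1}-y^k)|^2 - V^{k+1}
for _ in range(30):
    v_old, y_old = lyapunov(y, lam), y
    # ADMM iteration (2) in closed form for this toy problem.
    x = clip((a - lam + rho * y) / (1.0 + rho), x_box[0], x_box[1])
    y = clip((b + lam + rho * x) / (1.0 + rho), y_box[0], y_box[1])
    r = x - y
    lam = lam + rho * r
    decrease = rho * r ** 2 + rho * (y - y_old) ** 2
    gaps.append(v_old - decrease - lyapunov(y, lam))

# Inequality (6) states that every gap is nonnegative.
print(min(gaps) >= -1e-12, lyapunov(y, lam))
```

On this instance every recorded gap is nonnegative and the Lyapunov function decays toward zero, in line with (6).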



Proof that any limit point of $\{\lambda^k\}$ is dual optimal. We have seen that $\{\lambda^k\}$ is bounded and
thus it has limit points. Let $\bar{\lambda}$ be any limit point and let $\mathcal{K} \subset \mathbb{N}$ be the set of indices that yield
that limit point, i.e., $\{\lambda^k\}_{k \in \mathcal{K}} \to \bar{\lambda}$. We will show that $\bar{\lambda}$ is dual optimal.

Let $(\bar{x}, \bar{y})$ be a limit point of $\{(x^k, y^k)\}_{k \in \mathcal{K}}$. We have seen that $(\bar{x}, \bar{y})$ exists (because $\{(x^k, y^k)\}$,
and thus $\{(x^k, y^k)\}_{k \in \mathcal{K}}$, is bounded) and that it is primal optimal. From now on, we will assume
that $\{(x^k, y^k)\}_{k \in \mathcal{K}} \to (\bar{x}, \bar{y})$ (since $(\bar{x}, \bar{y})$ is just a limit point of $\{(x^k, y^k)\}_{k \in \mathcal{K}}$, if necessary, take a
subsequence of $\mathcal{K}$).
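Now, define

\[
\hat{\lambda}^k := \lambda^k - \rho B(y^k - y^{k-1}),
\]

and note that, because $B(y^k - y^{k-1}) \to 0$ implies $\{B(y^k - y^{k-1})\}_{k \in \mathcal{K}} \to 0$, $\{\lambda^k\}_{k \in \mathcal{K}}$ and $\{\hat{\lambda}^k\}_{k \in \mathcal{K}}$
have the same limit point $\bar{\lambda}$. From the definition of $F(\lambda)$ and equation (7),

\[
\begin{aligned}
F(\hat{\lambda}^k) &= \inf_{x \in X} \; f(x) + (\hat{\lambda}^k)^T (Ax) \\
&= f(x^k) + (\hat{\lambda}^k)^T Ax^k \qquad\qquad\qquad\qquad (17) \\
&\leq f(x) + (\hat{\lambda}^k)^T Ax, \quad \forall\, x \in X. \qquad\qquad (18)
\end{aligned}
\]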

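Similarly, from the definition of $G(\lambda)$ and (8),

\[
\begin{aligned}
G(\lambda^k) &= \inf_{y \in Y} \; g(y) + (\lambda^k)^T By \\
&= g(y^k) + (\lambda^k)^T By^k \qquad\qquad\qquad\qquad (19) \\
&\leq g(y) + (\lambda^k)^T By, \quad \forall\, y \in Y. \qquad\qquad (20)
\end{aligned}
\]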

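Adding equations (17) and (19) and taking the limit $k \to +\infty$ ($k \in \mathcal{K}$) on both sides,

\[
\lim_{\substack{k \to +\infty \\ k \in \mathcal{K}}} \big(F(\hat{\lambda}^k) + G(\lambda^k)\big) = f(\bar{x}) + g(\bar{y}) + \bar{\lambda}^T (A\bar{x} + B\bar{y}) = p^\star + \bar{\lambda}^T c = L(\lambda^\star) + \bar{\lambda}^T c,
\tag{21}
\]

where $L(\lambda) := F(\lambda) + G(\lambda) - \lambda^T c$ denotes the objective of the dual problem (3), the second-to-last equality follows from optimality and primal feasibility of $(\bar{x}, \bar{y})$, and the last
equality follows from strong duality. Adding equations (18) and (20) and taking the limit $k \to +\infty$
($k \in \mathcal{K}$) on both sides,

\[
\lim_{\substack{k \to +\infty \\ k \in \mathcal{K}}} \big(F(\hat{\lambda}^k) + G(\lambda^k)\big) \leq f(x) + g(y) + \bar{\lambda}^T (Ax + By), \quad \forall\, x \in X, \; \forall\, y \in Y.
\]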

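In particular, we can take the infimum on the right-hand side:

\[
\lim_{\substack{k \to +\infty \\ k \in \mathcal{K}}} \big(F(\hat{\lambda}^k) + G(\lambda^k)\big) \leq F(\bar{\lambda}) + G(\bar{\lambda}).
\tag{22}
\]

Inequality (22) and equation (21) yield

\[
F(\bar{\lambda}) + G(\bar{\lambda}) - \bar{\lambda}^T c \geq L(\lambda^\star),
\]

showing that $\bar{\lambda}$ solves (3), i.e., it is dual optimal.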
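
The dual problem (3) can be evaluated explicitly on a small instance. The following sketch is illustrative only and rests on assumptions not found in the original text: the scalar toy problem $f(x) = \tfrac{1}{2}(x-3)^2$, $g(y) = \tfrac{1}{2}(y-1)^2$, $X = [0, 1.5]$, $Y = [-1, 4]$, $A = 1$, $B = -1$, $c = 0$, for which $F$ and $G$ have closed forms and the primal optimum is $x = y = 1.5$.

```python
def clip(v, lo, hi):
    """Project the scalar v onto the interval [lo, hi]."""
    return max(lo, min(hi, v))


# Assumed toy data (hypothetical values, see the text above).
a, b = 3.0, 1.0
x_box, y_box = (0.0, 1.5), (-1.0, 4.0)


def F(lam):
    """inf over x in X of f(x) + lam*A*x, with A = 1."""
    x = clip(a - lam, x_box[0], x_box[1])  # minimizer of a 1-D convex quadratic on an interval
    return 0.5 * (x - a) ** 2 + lam * x


def G(lam):
    """inf over y in Y of g(y) + lam*B*y, with B = -1."""
    y = clip(b + lam, y_box[0], y_box[1])
    return 0.5 * (y - b) ** 2 - lam * y


def dual(lam):
    """Dual objective F(lam) + G(lam) - lam*c, with c = 0."""
    return F(lam) + G(lam)


# Primal optimal value, attained at x = y = 1.5 for this instance.
p_star = 0.5 * (1.5 - a) ** 2 + 0.5 * (1.5 - b) ** 2

# Maximize the dual objective over a grid of multipliers.
lams = [i / 100.0 for i in range(-300, 301)]
best = max(lams, key=dual)
print(best, dual(best), p_star)
```

On this instance the dual objective never exceeds $p^\star$ on the grid and matches it at $\lambda = 0.5$, illustrating the absence of a duality gap that the proof relies on.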
Proof of convergence of the sequence $\{(x^k, y^k, \lambda^k)\}$ generated by ADMM. Let $(\bar{x}, \bar{y}, \bar{\lambda})$
be any limit point of the sequence $\{(x^k, y^k, \lambda^k)\}$ generated by ADMM. We have proved that $(\bar{x}, \bar{y})$
is primal optimal and $\bar{\lambda}$ is dual optimal. Now we prove that $\{(x^k, y^k, \lambda^k)\}$ has a unique limit point,
and is therefore convergent.

Since $(\bar{x}, \bar{y}, \bar{\lambda})$ is a primal-dual solution, the inequalities (4)-(6) hold with $(x^\star, y^\star, \lambda^\star)$ replaced
by $(\bar{x}, \bar{y}, \bar{\lambda})$. We had already seen that $V^k$ in (6) is convergent (because it is bounded and
non-increasing); now we see that its limit is $0$ if we use $(\bar{x}, \bar{y}, \bar{\lambda})$ as the solution, because $V^k \to 0$ along the subsequence that converges to this limit point. This implies
that both $\lambda^k \to \bar{\lambda}$ and $B(y^k - \bar{y}) \to 0$. Since $B$ has full column-rank, we also have $y^k \to \bar{y}$.
Since $r^k = Ax^k + By^k - c = A(x^k - \bar{x}) + B(y^k - \bar{y}) \to 0$ and $A$ has full column-rank, also $x^k \to \bar{x}$.

This shows that all the sequences $x^k$, $y^k$, and $\lambda^k$ produced by ADMM converge.

□ 